ABSTRACT


The objective of this study is to examine research studies conducted between 2017 and 2023 that
focus on the analysis of Electrical Load Profiles (ELPs) using clustering algorithms within a machine learning
framework. The methodology used in this research is the Preferred Reporting Items for Systematic Reviews and
Meta-Analyses (PRISMA) framework. The study finds that cluster formation can be categorized into two
distinct approaches. The first uses a specified number of clusters, while the second does not require the
number of clusters to be determined explicitly. The method employed to determine the number of clusters
also has a significant impact on the performance and quality of clustering, as it influences the features
involved. This study explores various aspects related to clustering, including techniques for measuring the
distance between data points, strategies for initializing cluster centers, approaches for reducing the
dimensions of initial data, and methods for identifying and addressing outliers. The findings offer insights
into the technological obstacles and emerging patterns in the analysis of ELPs and point to prospects for
future research.
1. INTRODUCTION

The energy sector has embraced the big data trend, as evidenced by the
growing interest of researchers in gathering and analyzing energy data
[1], [2]. This shift towards big data analytics for flexible energy
sharing marks a significant evolution in how the industry approaches
data, emphasizing the importance of comprehensive data collection and
advanced analytics for more effective energy management. Energy
consumption data can be obtained from several sources, such as recording
devices attached to energy meters. Automatic Meter Reading (AMR) is a
tool that can be used to record electrical energy consumption data in
medium-voltage and high-voltage electricity groups [3], while GSM-based
energy meters serve low-voltage electricity groups [4]. AMR recordings
in the electrical system can produce Electrical Load Profiles (ELPs)
records. ELPs are real-time data on electrical energy usage that can be
recorded from electricity meters installed in customer buildings every
10, 15, or 30 minutes [5]. Research on ELPs is a crucial focus for
utility companies in formulating strategic steps across various business
processes, including price and tariff planning, distribution network
operation planning, electricity production planning, load management,
customer service, and public authorities. Additionally, it can be
employed to identify energy consumption patterns, enabling the
forecasting of future energy demand, the design of energy efficiency
programs, and the planning of power grid capacity [6], [7]. To achieve
these goals, clustering methods can be employed as effective tools for
identifying patterns and trends within large and complex ELPs data. This
helps in the analysis and understanding of load profiles [8]. The
analysis of ELPs also presents numerous potentially beneficial
opportunities across various aspects of energy consumption systems [9],
[10], including electricity usage behavior, electrical energy pricing,
and forecasting energy demand management for the future [11].


In the context of machine learning, clustering is often utilized in
exploratory data analysis, where the process involves learning
structures and patterns existing in data without prior class labels.
Therefore, clustering is typically categorized as an 'unsupervised
learning' method in machine learning [8], [12], [13]. Applications of
clustering methods in machine learning encompass various fields,
including customer segmentation, document grouping, anomaly detection,
gene or protein grouping, image grouping, superconducting clustering,
recommendation systems, big data analysis [14], and even specifically
the analysis of energy load profiles [6], [7], [15].


Considering the theme of load profile analysis,
this work focuses on enhanced clustering
approaches. For instance, the FCM
improvement performed by Mingyang (2020)
modifies the objective function by incorporating
cluster volumes to overcome the effects of
distribution imbalances in the data [16]. In contrast,
Qaiyum et al. (2019) utilized Ant Colony
Optimization (ACO) and Maximum Residual
Sampling (MRS) techniques to shorten run time,
reduce space requirements, and address time
complexity issues in big data dimensioning problems
related to clustering algorithms [17]. The subsequent
challenge is to define clustering methods for complex
load profile data analysis and determine appropriate
techniques to enhance clustering methods in terms of
efficiency and accuracy.


A compilation of findings from pertinent 
previous studies can aid in identifying current 
research trends. This identification process can be 
accomplished through a Systematic Literature 
Review (SLR), a tool that encourages researchers to 
investigate their research subject using a broad 
search strategy, predetermined search terms, and 
straightforward inclusion and exclusion criteria. 
Theoretically, employing SLR increases the 
likelihood of obtaining clearer and more objective 
research answers [18]. 


The purpose of this study is to conduct a 
Systematic Literature Review (SLR) on the subject 
of Energy Load Profile research, evaluate various 
clustering methods concerning different problems, 
and identify potential gaps in the existing literature 
between 2017 and 2023 that focus on the analysis of 
Electrical Load Profiles (ELPs) using clustering 
algorithms within a machine learning framework. 
For this purpose, the research database utilized is a
Scopus-indexed database covering Quartile 1 to
Quartile 4, which is considered an important
database of peer-reviewed publications. Scopus offers
a comprehensive overview of global research in
diverse and impactful subject areas in the scientific
journals within the academic community [19].


This research is needed to examine and evaluate
the clustering algorithms that have been applied in
load profile analysis, so that the performance of each
algorithm can be easily understood, and to identify
gaps in the literature that can be filled, such as
challenges in improving the performance of
clustering methods in the analysis of ELPs. Our
contribution primarily addresses the latest
developments in the research field, clustering
techniques, and the focus of studies for each
application. To achieve this contribution, we
modified our review procedure to emphasize the
engineering details of each selected study rather than
the results of each paper. Considering the objectives
and constraints of its application, this study will
assist researchers in identifying the most prominent
domains of ELPs analysis, as well as the most
popular clustering methods.


2. RESEARCH METHODOLOGY
The method employed in this study is a
Systematic Literature Review (SLR). SLR is a
technique for managing information sources related
to a predetermined topic [20]. In this study, SLR was
utilized to assess various clustering methods and
their associated challenges, and to identify potential
gaps in load profiling analysis. Relevant research was
searched using the term "Load Profile" in the
title, abstract, and keywords of articles, with the
analysis method being "Clustering".

The SLR method employed in this study follows 
the Preferred Reporting Items for Systematic 
Reviews and Meta-Analyses (PRISMA) guidelines. 
PRISMA is utilized to select reporting elements for 
systematic reviews and meta-analyses, providing a 
tool to evaluate the reliability of publications for 
systematic or literature reviews [20]. The following 
steps outline the preparation of checklist items for 
systematic review and meta-analysis: 

1. Title: Identifies systematic reviews and
meta-analyses.

2. Structured abstract: Comprises several parts,
namely introduction, materials and methods,
results, and conclusions.

3. Introduction: Addresses the urgency of the
systematic review, outlines objectives, and
introduces the meta-analysis of the systematic
review.

4. The literature search method is executed by
exploring literature portal sources based on
research questions in the scientific article
database. This involves eliminating data
duplication, applying inclusion and exclusion
criteria, and selecting appropriate papers.

5. The results are presented with an overview
diagram illustrating the article selection
process.

6. The discussion section outlines the limitations
and gaps in previous studies.

7. The conclusion summarizes the findings and
provides recommendations for future research.


The flow of PRISMA implemented in this 
study can be observed in Figure 1: 


[Figure 1: PRISMA flow diagram.
Identification: records identified by keyword in Scopus, Science Direct,
IEEE, MDPI, and Springer; removed before screening: duplicate records
(n=95), records outside 2017-2023 (n=278), records without a Q1-Q4 tier
(n=12), records without abstract (n=3).
Screening: records screened (n=536), excluded (n=270); records screened
on the basis of title (n=266), excluded (n=175); records screened on the
basis of abstract (n=91), excluded (n=41) for incomplete text, irrelevant
journal content, or unreadable typeface (Japanese, Chinese, and Korean
letters).
Included: studies from the databases, 2017-2023, tiers Q1-Q4 (n=50),
plus additional records identified from other sources; total studies
included in the review (n=52).]


Figure 1 Process of Literature Review with PRISMA
2.1 Literature Review

The parameters used in paper selection are
explained based on the exclusion criteria employed
for document screening. The objectives related to the
analytical components of this Systematic Literature
Review (SLR) are carried out through the stages of
identification, screening (excluded), and inclusion
(included). SLR is conducted by systematically
discovering, critically analyzing, and broadly
interpreting relevant and applicable research
findings, as shown in Table 1. Fig. 1 illustrates
the application of the filter in the paper selection
process on the chosen databases, resulting in a total
of 52 papers.
Table 1 Observation of Literature Review


Observation               Literature Review

Research Question (RQ)    1. What clustering methods are often used in the
                             analysis of ELPs?
                          2. What features are needed to improve the
                             performance of clustering methods?
                          3. What evaluation methods are often used to
                             assess the performance of clustering
                             techniques?

Literature Selection      1. Journal Publication, Review Paper, Original
                             Papers
                          2. The Publication Period 2017-2023
                          3. Potentially answer research questions
                          4. The Scopus database contains indexed
                             publications, including information such as
                             title, affiliation, year published, source,
                             abstract, and quartile rank (Q1-Q4 and Non-Q).
                          5. The focus of the literature is on clustering
                             methods for load profiling analysis based on
                             machine learning trends in the field of
                             clustering.
                          6. Publications are written in the Latin
                             alphabet and in English

Literature Source         Scopus, Science Direct, IEEE Xplore,
                          Multidisciplinary Digital Publishing Institute
                          (MDPI), Springer

Keywords                  Load profile AND (Clustering analysis OR
                          Clustering Method OR Data clustering OR
                          Electricity load OR Pattern recognition OR
                          Unsupervised learning OR Supervised learning OR
                          Data mining OR Machine learning OR Feature
                          extraction OR Data preprocessing OR Validation
                          Clustering OR Optimal cluster number OR Time
                          series analysis OR K-Means OR Fuzzy C-Mean OR
                          Fuzzy-Subtractive Clustering OR Self Organizing
                          Map OR Hierarchical Clustering OR eXplainable
                          Artificial Intelligence OR Artificial
                          Intelligence)


Table 1 above explains the research questions,
literature selection, literature sources, and keywords
used, encompassing the phases involved in
conducting the Systematic Literature Review (SLR)
as outlined in the PRISMA stages in Figure 1.
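
The keyword combination in Table 1 can be assembled programmatically, which helps keep the same search string across sources. The following is a minimal sketch, not the authors' actual tooling: the function name `build_query` and the use of double quotes around each phrase are assumptions, and each database (Scopus, IEEE Xplore, and so on) has its own query syntax that may require adjustments.

```python
# Primary term and secondary terms copied from the Keywords row of Table 1.
primary_term = "Load profile"
secondary_terms = [
    "Clustering analysis", "Clustering Method", "Data clustering",
    "Electricity load", "Pattern recognition", "Unsupervised learning",
    "Supervised learning", "Data mining", "Machine learning",
    "Feature extraction", "Data preprocessing", "Validation Clustering",
    "Optimal cluster number", "Time series analysis", "K-Means",
    "Fuzzy C-Mean", "Fuzzy-Subtractive Clustering", "Self Organizing Map",
    "Hierarchical Clustering", "eXplainable Artificial Intelligence",
    "Artificial Intelligence",
]


def build_query(primary, secondary):
    """Hypothetical helper: "primary" AND ("t1" OR "t2" OR ...)."""
    alternatives = " OR ".join(f'"{term}"' for term in secondary)
    return f'"{primary}" AND ({alternatives})'


query = build_query(primary_term, secondary_terms)
print(query)
```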


2.1.1 Identification 

Based on the search keywords used in the
literature sources employed for this SLR (Scopus,
Science Direct, IEEE Xplore, Multidisciplinary
Digital Publishing Institute - MDPI, and Springer), a
total of 924 papers were initially identified. Before
the screening process, 95 duplicate papers were
removed. Additionally, 278 papers were deemed
ineligible as they fell outside the 2017-2023
timeframe, 12 papers lacked a Quartile rating (Q1 to
Q4), and 3 papers without abstracts were excluded
prior to the screening process.


2.1.2 Screening 

Following the identification and examination
process, a total of 536 papers remained.
Subsequently, these papers were screened by title,
keeping those related to improving clustering
methods and load profile analysis, which resulted in
a total of 266 papers. Further screening based on
abstracts yielded 91 papers that met the inclusion
criteria. A more thorough screening then involved
careful reading of the full text of these 91 papers. Of
these, 41 papers were excluded due to incomplete
text, irrelevant journal content, or content written in
Japanese, Chinese, or Korean characters that could
not be read. Consequently, 50 papers were obtained
that aligned with the research objectives. However,
12 papers lacked a Quartile rating; they were
carefully reviewed to extract relevant information,
leading to the identification of 2 non-Q journals that
could be utilized for this study.


2.1.3 Included 

Following the screening process, a total of 52 
articles were identified for further analysis through 
meta-analysis. The objective is to examine data 
patterns and research trends. 
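
The counts reported in Sections 2.1.1 to 2.1.3 and Figure 1 can be checked with simple arithmetic. The short sketch below only replays the numbers stated in the text; the variable names are illustrative and not part of the original study.

```python
# Counts reported in Sections 2.1.1 to 2.1.3 and in Figure 1.
identified = 924
removed_before_screening = {
    "duplicates": 95,
    "outside 2017-2023": 278,
    "no Q1-Q4 quartile rating": 12,
    "no abstract": 3,
}

# Records that enter screening.
screened = identified - sum(removed_before_screening.values())

after_title = screened - 270           # excluded at title screening
after_abstract = after_title - 175     # excluded at abstract screening
after_full_text = after_abstract - 41  # excluded after full-text reading
included = after_full_text + 2         # two non-Q papers added after review

print(screened, after_title, after_abstract, after_full_text, included)
# 536 266 91 50 52
```

Each intermediate value matches the figure quoted in the text (536, 266, 91, 50, and 52), so the reported selection flow is internally consistent.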


2.2 Research trends in Electricity Load Profile 

The examination of the dataset, which 
encompasses 52 scientific articles published 
between 2017 and 2023, reveals the distribution of 
papers over the research timeline, as visually 
depicted in Figure 2: 
[Figure 2: chart with axis label "Publication Year"]


Figure 2 Literature Distribution from 2017 until 2023


Figure 3 and Figure 4 illustrate the distribution of
scientific papers categorized by their reputation at
the quartile level. The data show that 63% of
the papers included in the analysis fall into the Q1
category. In the context of quartile ranking, Q1 often
includes high-quality papers that have made
significant contributions to the scientific literature or
have relevance in a particular study domain [21]. In
other words, more than half of the articles examined
are classified in the Q1 group.


[Figure 3: chart titled "Article By Quartile"]

Figure 3 The Distribution of Literature Based on Quartile

Figure 4 Distribution of Literature Based on Year and Quartile


Clustering is an approach used in unsupervised
machine learning models [14], [22], [23]. A
literature review focusing on ELPs analysis using
machine learning and artificial intelligence
approaches has been conducted based on several
existing papers. Table 2 presents a comparative
analysis of surveys similar to the one conducted in
this study, aiming to assess the uniqueness and
originality of the current research.


Table 2 Comparison With Comparable Surveys In The Existing Literature


Ref. | Research Material | The Present Study
Cembranel, 2019 [13] | This paper examines various data mining techniques used in clustering electricity customers. The main emphasis lies on the process of knowledge discovery in databases (KDD), which includes various stages including data selection, pre-processing, data mining, evaluation, and application of knowledge. | This study analyzes clustering techniques commonly used in ELPs research and looks for opportunities to improve the effectiveness of certain clustering methods in load profile analysis.
Kewo, 2023 [24] | This paper reviews residential electrical load profiles, identifying current methods for modeling such profiles. The study also addresses the advantages and disadvantages of various approaches, focusing on data characteristics, validation, and quality scores. Most of the research focuses on load profile development and load disaggregation. | This study analyzes what features are needed to improve the performance of clustering methods in ELPs analysis.
Ramoke, 2021 [25] | The paper discusses the advantages and disadvantages of AI-based models and compares them with conventional non-AI-based models to determine energy consumption patterns by time series data analysis. | This study analyzes the methods that can be used to evaluate clustering methods.
Several studies have explored the application
of clustering methods with a machine learning
approach to Energy Load Profiles (ELPs). However,
the aforementioned studies do not specifically
investigate the use of ELPs data in various tariff
categories, including Household, Business,
Industrial, Social, and Public. Studying ELPs using
clustering with a machine learning approach within
each of these tariff groups offers new opportunities
for ELPs research analysis.


Identified potential research gaps in employing a 
machine learning approach to clustering methods for 
ELPs analysis include: 


1. Challenges in reducing dimensions in ELPs
data.

2. Challenges in determining which clustering
methods can be improved for ELPs analysis.

3. Challenges in improving the performance of
clustering methods in ELPs analysis.

4. Challenges in determining the optimal number
of clusters in clustering analysis.

5. Challenges in interpreting improved clustering
performance.


The novelty of this research lies in addressing 
new opportunities for the advancement of clustering 
methods in Energy Load Profiles (ELPs) analysis, 
aiming for improved optimality, efficiency, and 
accuracy. Trends in load profiling analysis continue 
to evolve, driven by technological advancements, 
increasingly complex energy demands [24], and a 
growing emphasis on energy efficiency and 
renewable energy [26]. These aspects offer 
significant benefits across various contexts, 
particularly within the energy industry and energy 
resource management. Fig. 5 provides a
detailed overview of publications mentioning terms
related to data analysis in the energy domain.


[Figure 5: chart with axis label "Research Term"]

Figure 5 Number Of Articles In The Sample That Mention A Particular Research Term


Figure 5 displays the number of entities in the energy
industry, as confirmed through the scientific research
that has been carried out. The main topic of discussion
in the selected articles is the load profile, as indicated
by the frequent occurrence of that term.

2.3 Clustering Research In Load Profile 

In this section, we provide a comprehensive
review of selected articles, evaluating them based on
their year of publication, keyword appearance, and
relevance of reference journals in answering
research questions. In the context of clustering
procedures, data is arranged or partitioned based on
similarities and differences. These techniques are
sometimes referred to as unsupervised learning
approaches [13]. Unsupervised learning techniques
rely solely on information attached to data to group
data sets. Table 3 below describes the clustering
methods used for ELPs analysis.
Table 3 Studies Of The Use Of Machine Learning And AI To Enhance Clustering Method In Load Profiling Analysis


Author (Citation) | Clustering Method | Application | Feature Extraction | Evaluation Method
Shi, 2020 [15] | K-Means | Load Profile | PCA | SIL, CHI, DBI
Yilmaz, 2019 [27] | K-Means | Load Profile | - | SIL
Duarte, 2022 [9] | K-Means | Load Profile | PCA | -
Choksi, 2020 [6] | K-Means, SOM | Load Profile | - | -
Zang, 2023 [28] | FCM | Consumer Power Load Data | - | SIL, DBI, CHI, IN
Bian, 2023 [29] | FCM, Spectral Clust | Power Load | |
Zhao, 2023 [30] | Spectral Clust | Power Load | - |
Cen, 2022 [31] | K-Means, Hierarchical Clust, FCM | Load Profile | PCA | SIL, SSE
Jessen, 2022 [32] | K-Means | Load Profile | - | -
Kim, 2022 [11] | K-Means | Electrical Load | MultiTag2Vec | -
Liu, 2021 [16] | FCM | Load Profile | - | -
Unal, 2021 [33] | DBSCAN | Load Profile | MDS | RMSSTD Idx
Jeong, 2021 [34] | K-Means | Load Profile | PCA | -
Valdes, 2021 [35] | Time Series Clustering | Electrical Load | - | -
Jain, 2021 [36] | K-Means, FCM | Electrical Load | PCA, FA | SIL, CHI, DI, DBI, XB
Wang, 2022 [37] | GMM | Load Profile | PCA | -
Ruhang, 2020 [38] | Agglomerative Clustering | Electrical Load | Overlapping Sliding Window | -
Zang, 2020 [39] | K-Means | Load Profile | SVD | -
Lin, 2019 [7] | Spectral Clust | Load Profile | PAA | -
Lin, 2019 [10] | Evolutionary Clust | Load Profile | - | SQ, HQ
Vahedi, 2023 [40] | Auto Clustering | Load Profile | TB | -
Binh, 2018 [41] | K-Means, SC, SOM | Electricity Consumption | - | -
Ray, 2019 [2] | Adaptive Clust | Load Profile | - | NMI
Damayanti, 2017 [5] | K-Means, FCM, KHM | Load Profile | - | DBI
Yang, 2022 [42] | K-Means | Electrical Load | LSTM-AE | SIL
Eiraudo, 2023 [43] | K-Means | Load Profiles | - | WCSS, SIL, DBI, CHI
Senen, 2023 [44] | FCM | Load Profile | - | SIL
Jain, 2022 [45] | K-Means, FCM, Agglomerative Clustering | Load Profile | t-SNE | CVIs, CH
Flor, 2021 [46] | K-Means | Load Profile | - | SIL
Mares, 2020 [47] | Hierarchical Clust | Load Profile | |
Kim, 2018 [48] | MSC | Load Profile | - |
Llanos, 2017 [49] | SOM | Load Profile | - |

Acronyms (Alphabetical): CHI (Calinski-Harabasz Index); CNN (Convolutional Neural Network); CVIs (Cluster Validation
Indices); DBI (Davies Bouldin Index); DBSCAN (Density Based Spatial Clustering of Applications with Noise); DI (Dunn Index);
FA (Factor Analysis); FCM (Fuzzy C-Means); GMM (Gaussian Mixture Model); HQ (History Quality); ISD (Scattering Density
Index); KHM (K-Harmonic Means); LSTM-AE (Long Short Term Memory AutoEncoder); MDS (Multidimensional Scaling);
MSC (Mean Shift Clustering); MultiTag2Vec (Multidimensional Tag to Vector); NMI (Normalized Mutual Information); PAA
(Piecewise Aggregate Approximation); PCA (Principal Component Analysis); RMSSTD Idx (Root Mean Square Standard Deviation
Index); SC (Subtractive Clustering); SIL (Silhouette Index); SOM (Self-Organizing Map); SQ (Snapshot Quality); SSE (Sum of
Squared Errors); SVD (Singular Value Decomposition); TB (Temperature Based Clustering); t-SNE (t-distributed Stochastic
Neighbor Embedding); VMD (Variational Mode Decomposition); WCSS (Within Cluster Sum of Square); XB (Xie and Beni
Index).
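
The Silhouette Index (SIL) is the evaluation method that appears most often in Table 3. As an illustration of what it measures, the following is a minimal plain-Python sketch of the standard silhouette definition with Euclidean distance. The four two-feature points and both labelings are hypothetical and are not taken from any reviewed study.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points, labels):
    """Mean silhouette width; needs at least two clusters.

    Points in single-member clusters score 0 by convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical feature vectors forming two obvious groups.
points = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [8.0, 9.0]]
good = silhouette_index(points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_index(points, [0, 1, 0, 1])   # mixes the groups
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own, which is why the mixed labeling scores below zero here.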
2.4 Features To Improve The Performance Of
Clustering Methods
The determination of the number of clusters, data
normalization, selection of distance functions
between data points, initialization of cluster centers,
dimension reduction, and handling of outliers are
highly influential in enhancing clustering
performance. Table 4 below outlines the clustering
methods detailed in Table 3 and the features
utilized to improve clustering performance [12],
[14], [50].
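
The distance functions that recur in the reviewed studies (Euclidean, Cityblock or Manhattan, Chebychev, and Cosine) can be compared on a toy example. The following is a minimal sketch in plain Python; the two four-point profiles are hypothetical values chosen for illustration, with the second profile equal to the first one doubled.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    # Also called Manhattan distance.
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; undefined for all-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical load profiles (kW); profile_b has the same shape at twice the level.
profile_a = [1.0, 2.0, 4.0, 2.0]
profile_b = [2.0, 4.0, 8.0, 4.0]

print(euclidean(profile_a, profile_b))        # 5.0
print(cityblock(profile_a, profile_b))        # 9.0
print(chebyshev(profile_a, profile_b))        # 4.0
print(cosine_distance(profile_a, profile_b))  # 0.0
```

The cosine distance is zero because the two profiles differ only in magnitude, whereas the other three metrics report a clear difference. This is one reason the choice of distance function changes which customers end up in the same cluster.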


Table 4. Features That Influence The Formation And Improvement Of A Cluster


Clustering Method | Cluster Number Method | Distance Function Method | Cluster Center Initiation | Outlier Detection and Handling Methods | References
K-Means | Elbow | Euclidean | Random (K-Means++) | Deletion of outliers that surpass the variables' standard deviation | Shi, 2020 [15]
K-Means | Random Based on Indicator | - | | - | Duarte, 2022 [9]
K-Means | Gap Evaluation | Euclidean distance, Manhattan distance, and Cosine similarity | Cluster Point Average | - | Choksi K, 2020 [6]
K-Means | DBSCAN | Euclidean, Cityblock, Cosine, Chebychev | - | DBSCAN | Roter, 2022 [14]
K-Means | Elbow & Silhouette | Euclidean | Random (K-Means++) | Deletion, Imputation, Transformation | Shi, 2021 [23]
K-Means | Random Based on Entropy | Euclidean | Cluster Point Average | - | Chowdhury, 2021 [51]
K-Means | Elbow and Silhouette | Euclidean | Cluster Point Average | DBSCAN | Nguyen, 2020 [52]
K-Means | Elbow | Euclidean | Cluster Point Average | Deletion outliers | Liang, 2020 [53]
K-Means | - | Euclidean | Cluster Point Average | Deletion outliers | Jain, 2021 [36]
K-Means | Elbow and Silhouette | Euclidean | Cluster Point Average | - | Pooya, 2021 [54]
K-Means | Elbow | Euclidean | Cluster Point Average | - | Jeong, 2021 [34]
K-Means | Elbow and Silhouette | Euclidean | Cluster Point Average | - | Kim, 2022 [11]
K-Means | Gap Evaluation | Euclidean | - | - | Xie, 2022 [22]
K-Means | Index Validity | Inter Class distance | KICIC | - | Zang, 2020 [39]
K-Means | - | Euclidean | SC | - | Binh, 2018 [41]
K-Means | Elbow and Calinski-Harabasz scores | Euclidean | Cluster Point Average | - | Cen, 2022 [31]
K-Means | Elbow | Euclidean | - | - | Jessen, 2022 [32]
K-Means | - | Euclidean | - | Deletion outliers | Yang, 2022 [42]
K-Means | Elbow and Gap Statistic | Euclidean | Cluster Point Average | - | Jain, 2022 [45]
K-Means | - | DTW | - | - | Flor, 2021 [46]
SOM | Gap Evaluation | Euclidean, Manhattan | Cluster Point Average | Median Value, Statistic Method and Deletion | Choksi, 2020 [6]
SOM | - | Euclidean | Cluster Point Average | - | Llanos, 2017 [49]
FCM | - | Non-Euclidean distance Metrics | Cluster Point Average | Non-Euclidean distance Metrics | Hashemzadeh, 2019 [2]
FCM | Index Validity | Euclidean | Find Primary Value | - | Sing, 2019 [55]
FCM | - | - | Cluster Point Average | - | Jain, 2021 [36]
FCM | Index Validity | DTW | Find Primary Value | - | Liu, 2021 [16]
FCM | Index Validity | Hyperbolic Correlation based Distance | Cluster Point Average | - | PeerJ, 2018 [56]
FCM | Elbow & Calinski-Harabasz Score | Euclidean | Cluster Point Average | - | Cen, 2022 [31]
FCM | Index Validity | - | - | - | Nguyen, 2022 [57]
FCM | Elbow, Silhouette, Gap Evaluation | Euclidean, Manhattan, Cosine Similarity | Cluster Point Average | Median Value & Deletion | Bian, 2023 [29]
FCM | - | Euclidean | - | - | Amane, 2023 [58]
FCM | Automatically based on data density | - | - | - | Mola, 2021 [59]
FCM | Silhouette | Euclidean | Find Primary Value | - | Senen, 2023 [44]
FCM | FPC | Euclidean | Cluster Point Average | - | Jain, 2022 [45]
Spectral Clust | Matrix Perturbation Theory | A multi-scale similarity metric consisting of Euclidean distance, shape fluctuation, and shape trend | Eigenvectors | - | Lin, 2019 [7]
Hierarchical Clust | Cut Dendrogram | Single Linkage | There is no concept of "cluster center" | - | Roter, 2022 [14]
Hierarchical Clust | The data is clustered and the two most similar clusters are combined using distance | Single linkage, complete linkage, average linkage, and Ward's method | There is no concept of "cluster center" | - | Li, 2020 [53]
Hierarchical Clust | Elbow & Calinski-Harabasz | Ward Linkage | There is no concept of "cluster center" | - | Cen, 2022 [31]
DBSCAN | Does not Require a Predetermined Number of Clusters | Euclidean | Density Based | Data points outside the main cluster & Deletion | Unal, 2021 [33]
DBSCAN | Does not Require a Predetermined Number of Clusters | Data point density | Density Based | Data points outside the main cluster & Deletion | Chen, 2021 [60]
DBSCAN | Does not Require a Predetermined Number of Clusters | Data point density within maximum epsilon distance | Has MinPts | - | Guan, 2019 [61]
DBSCAN | Does not Require a Predetermined Number of Clusters | Euclidean, Cityblock, Cosine, Chebychev | Average or Median of cluster points | Distant points from the cluster, separated, and Deletion | Roter, 2022 [14]
Time Series Clustering | - | - | - | - | Valdes, 2021 [35]
GMM | BIC | - | Data Points are Assigned to the cluster with the Highest Probability | | Wang, 2022 [37]
Agglomerative Clustering | Try a certain cluster count | Cosine Similarity | Embedding Vector Average | - | Ruhang, 2020 [38]
Agglomerative Clustering | Dendrogram and cut it off at a certain point | Single Linkage, Complete Linkage, and Average Linkage | There is no concept of "cluster center" | | Jain, 2022 [45]
Evolutionary Clust | - | - | One data point represents each potential group | - | Lin, 2019 [10]
SC | Does not Require a Predetermined Number of Clusters | - | - | | Binh, 2018 [41]
SC | Does not Require a Predetermined Number of Clusters | Radius Value | Data point with highest density | - | Dhanalakshmi, 2019 [62]
SC | Does not Require a Predetermined Number of Clusters | PSO | Data point with highest density | - | False, 2019 [63]
SC | Does not Require a Predetermined Number of Clusters | Radius Value | Data point with highest density | - | Zeng, 2019 [64]
SC | - | - | Data point with highest density | - | Abdolkarimi, 2020 [65]
SC | Does not Require a Predetermined Number of Clusters | Radius Value | Data point with highest density | - | Mola, 2021 [59]
Adaptive Clust | Probabilistic | DTW | Average Load in each cluster | - | Le, 2019 [2]
KHM | Index Validity | Euclidean | Cluster Point Average | - | Damayanti, 2017 [5]
MSC | Does not Require a Predetermined Number of Clusters | SPPC | Average Load in each cluster | - | Kim, 2018 [48]
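
The Elbow method is the most frequent entry in the Cluster Number Method column of Table 4. The sketch below illustrates the idea in plain Python: run K-Means for several values of k and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. Two things here are assumptions made for reproducibility rather than anything taken from the reviewed studies: the nine two-feature profiles are hypothetical, and the seeding uses a deterministic farthest-point rule instead of the random K-Means++ seeding listed in Table 4.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=50):
    """Return (centers, wcss) for Euclidean K-Means."""
    # Deterministic farthest-point seeding (assumption, see lead-in).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # Update step: centers move to the mean of their members.
        new_centers = [
            mean(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


# Hypothetical feature vectors forming three well-separated groups.
profiles = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
    [5.0, 5.0], [5.2, 4.8], [4.8, 5.2],
    [9.0, 1.0], [9.2, 0.8], [8.8, 1.2],
]

wcss_by_k = {k: kmeans(profiles, k)[1] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

On this toy data the WCSS falls steeply up to k = 3 and barely changes afterwards, so the "elbow" sits at three clusters. Real load profile data rarely shows such a clean bend, which is one reason several studies in Table 4 pair the Elbow method with Silhouette or Calinski-Harabasz scores.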


3. RESULT AND DISCUSSION 

Based on a comprehensive analysis conducted on
a total of 52 major studies published between 2017
and 2023, it is revealed that 50 papers are classified
into Quartile 1 to 4, while the remaining 2 papers are
categorized as non-Q. This review has yielded
several important findings. Table 3 offers a
comprehensive overview of clustering methods used
in the analysis of Energy Load Profiles (ELPs),
including evaluation methods utilized in the last six
years. Table 4 provides a summary of the clustering
methods used, highlighting influential features that
contributed to the development of these methods.
Furthermore, data related to each research question
is compiled and presented in Figures 5 and 6.


3.1 RQ 1: Clustering methods for the analysis of 
ELPs 


Based on the information presented in Table 4,
the way the number of clusters is formed can be
categorized into two different approaches: one that
involves a predetermined number of clusters and
another that does not require an explicit
determination of the number of clusters. Their
distribution is shown in Figures 6 and 7.


[Figure 6: pie chart titled "Method for determining the number of
clusters"; Specified 88%, Unspecified 12%]


Figure 6 Method for determining the number of clusters


[Figure 7: diagram grouping clustering methods. Specified number of
clusters (88%): K-Means, SOM, FCM, Spectral Clustering, GMM,
Agglomerative Clustering. Unspecified number of clusters (12%):
Hierarchical Clustering, DBSCAN, Subtractive Clustering, MSC.]


Figure 7 Distribution of clustering methods based on the
method of forming the number of clusters


The primary approach (88%) to analyzing 
Electrical Load Profiles (ELPs) uses a specified 
number of clusters. Within this approach, the 
K-Means method is the most common (38%), 
followed by the FCM clustering method (19%). 
Spectral Clustering and SOM each account for 7%, 
while Agglomerative Clustering comprises 5%. 
Adaptive Clustering, Evolutionary Clustering, 
GMM, KHM, and Time Series Clustering each 
contribute 2%. The smaller group of approaches 
(12%) does not require the number of clusters to be 
determined explicitly and consists of the 
Subtractive Clustering, DBSCAN, Hierarchical 
Clustering, and MSC methods. 
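
The difference between the two approaches described above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the reviewed papers: the four-value load profiles, the naive center initialization, and the `eps` and `min_pts` values are all assumptions chosen for illustration. K-Means needs `k` in advance, whereas the DBSCAN-style routine derives the number of clusters from a radius and a density threshold.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, iterations=20):
    """Specified-number approach: k must be chosen before clustering."""
    centers = [list(p) for p in profiles[:k]]  # naive init: first k profiles
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign every profile to its nearest center.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                  for p in profiles]
        # Move every center to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def dbscan(profiles, eps, min_pts):
    """Unspecified-number approach: clusters emerge from radius and density."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(profiles)

    def neighbors(i):
        return [j for j, q in enumerate(profiles)
                if euclidean(profiles[i], q) <= eps]

    cluster = -1
    for i in range(len(profiles)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical profiles (kW at four times of day), interleaved by shape.
profiles = [
    [0.2, 0.3, 1.5, 0.4],  # evening peak
    [1.2, 1.3, 0.3, 0.2],  # daytime peak
    [0.3, 0.2, 1.6, 0.5],  # evening peak
    [1.1, 1.4, 0.2, 0.3],  # daytime peak
    [0.2, 0.4, 1.4, 0.4],  # evening peak
    [1.3, 1.2, 0.3, 0.2],  # daytime peak
    [3.0, 3.0, 3.0, 3.0],  # flat, unusually high consumer
]

kmeans_labels = kmeans(profiles, k=2)
dbscan_labels = dbscan(profiles, eps=0.5, min_pts=2)
print(kmeans_labels)
print(dbscan_labels)
```

On this toy data, K-Means has to place the flat high-consumption profile in one of the two requested clusters, while the density-based routine finds two clusters on its own and labels that profile as noise (`-1`). Its result depends on `eps` and `min_pts` instead of `k`.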


3.2 RQ 2: Features required to improve 
clustering method performance 


As detailed in Table IV, the most influential 
features affecting clustering performance and quality 
are determined by the methods used for the 
following: determining the number of clusters, 
measuring distances between data points, initializing 
cluster centers, reducing dimensionality in initial 
data, and detecting and handling outliers [12], [14]. 
This review establishes that the Elbow Method is the 
most widely adopted approach (81%) in cases 
involving a predefined number of clusters. For 
approaches not requiring explicit determination of 
the number of clusters, results heavily depend on the 
method of determining distances between data 
points. The methods used include Radius Value 
(25%), Euclidean Distance (17%), Data point 
density (17%), and contributions of 8% each from 
Cosine, Cityblock, Chebychev, PSO 
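
Four of the distance measures named above have standard definitions, which the following sketch writes out in plain Python. The two load profiles are hypothetical and were chosen so that they have the same shape but different magnitudes; the function names are illustrative only.

```python
import math

def euclidean(a, b):
    # Straight-line distance: sensitive to overall magnitude.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cityblock(a, b):
    # Sum of absolute differences (Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest single-interval difference.
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the profiles:
    # compares shape and ignores scale.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical profiles: profile_b is profile_a scaled by two.
profile_a = [1.0, 2.0, 4.0, 2.0]
profile_b = [2.0, 4.0, 8.0, 4.0]

d_euclidean = euclidean(profile_a, profile_b)
d_cityblock = cityblock(profile_a, profile_b)
d_chebyshev = chebyshev(profile_a, profile_b)
d_cosine = cosine_distance(profile_a, profile_b)
print(d_euclidean, d_cityblock, d_chebyshev, d_cosine)
```

The first three measures report a clear separation between the two consumers, while the cosine distance is zero because the profiles differ only in scale. This is one reason the choice of distance function can change which customers end up in the same cluster.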


To attain optimal clustering results, a 
preprocessing procedure is necessary before data 
analysis. Data preprocessing incorporates various 
techniques such as feature extraction, outlier 
handling, dealing with missing values, and data 
normalization [9], [40]. The PCA method stands out 
as the most widely utilized technique (41%) for 
feature extraction in data analysis by clustering. 
Following this, the PAA method is utilized at 12%, 
while MultiTag2Vec, MDS, Overlapping Sliding 
Window, FA, SVD, TBC, LSTM-AE, and t-SNE 
each contribute 6%. Although the topic of outlier 
detection is not extensively discussed in most 
papers, 25% of the reviewed papers providing 
information about outliers suggest identifying 
outliers by considering data points outside the main 
cluster and subsequently removing them [14], [33], 
[60]. 
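
Two of the preprocessing steps mentioned above, normalization and PAA-based feature extraction, are simple enough to sketch in plain Python. The hourly profile, the min-max scaling variant, and the choice of six segments are assumptions made for illustration; the reviewed papers may use other resolutions and normalization schemes.

```python
def min_max_normalize(profile):
    """Scale a profile to the range [0, 1] so that shape, not size, dominates."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)  # flat profile: no variation to scale
    return [(x - lo) / (hi - lo) for x in profile]

def paa(profile, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(profile) % segments != 0:
        raise ValueError("this sketch needs len(profile) divisible by segments")
    size = len(profile) // segments
    return [sum(profile[i:i + size]) / size
            for i in range(0, len(profile), size)]

# Hypothetical hourly load profile in kW (24 values).
hourly = [0.4] * 6 + [1.0] * 4 + [0.8] * 8 + [2.0] * 4 + [0.6] * 2

normalized = min_max_normalize(hourly)
reduced = paa(normalized, segments=6)  # 24 values reduced to 6 features
print(reduced)
```

The reduced vector keeps the coarse daily shape (low night load, evening peak) in six features instead of 24, which is the kind of dimensionality reduction applied before clustering.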


3.3 RQ 3: Evaluation methods used to assess 
clustering performance 

Evaluation methods are crucial for assessing 
clustering performance because they provide an 
objective means of judging the quality of the 
resulting clusters [66], [67], [68]. In the reviewed 
studies, the most frequently employed cluster 
evaluation method for load profiling analysis is the 
Silhouette Index (29%), followed by the 
Davies-Bouldin Index (19%), the Calinski-Harabasz 
Index (16%), and the Dunn Index (6%). Other 
indices, namely Cluster Validation Indices, History 
Quality, Scattering Density Index, Normalized 
Mutual Information, Root Mean Square Standard 
Deviation Index, Snapshot Quality, Sum of Squared 
Errors, Within Cluster Sum of Square, and the Xie 
and Beni Index, each contribute 3%, as illustrated 
in Figure 8. 
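
As an illustration of how the most frequently used index works, the sketch below computes the Silhouette Index in plain Python using its standard definition: for each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the score is `(b - a) / max(a, b)`. The Euclidean distance, the four profiles, and the two labelings are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is zero, so it adds nothing).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical profiles: two evening-peak and two daytime-peak consumers.
profiles = [
    [0.2, 0.3, 1.5, 0.4],
    [0.3, 0.2, 1.6, 0.5],
    [1.2, 1.3, 0.3, 0.2],
    [1.1, 1.4, 0.2, 0.3],
]

good_score = silhouette_index(profiles, [0, 0, 1, 1])  # grouped by shape
bad_score = silhouette_index(profiles, [0, 1, 0, 1])   # shapes mixed
print(good_score, bad_score)
```

A labeling that groups similar profiles scores close to 1, while one that mixes the two shapes scores below zero, so the index can be used to compare candidate clusterings of the same data.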


Figure 8 Distribution of clustering evaluation techniques 


4. CONCLUSION 


The findings of this study contribute to the 
ongoing research on electrical load profiling analysis 
with clustering methods, building upon prior studies. 
A significant limitation of this review is its reliance 
on a prioritized set of online repositories for the 
literature search (Scopus, Science Direct, IEEE 
Xplore, Multidisciplinary Digital Publishing 
Institute - MDPI, Springer); the use of additional 
keywords and synonyms could lead to further 
relevant research. The Systematic Literature Review 
(SLR) conducted with the PRISMA method yielded 
52 articles from 2017 to 2023 that discuss issues and 
techniques used to enhance clustering performance. 


The results, as depicted in Figure 3, reveal that 
the journals most represented in this study are those 
in the highest quartile, Q1, at 63%. As illustrated in 
Fig. 4, the largest share of these was published in 
2022, constituting 21% (7 of the 33 Q1 journals 
were published in 2022). Grouped by the method of 
forming the number of clusters, most clustering 
methods applied to Electrical Load Profiles (ELPs) 
fall into the "Specified number of clusters" group, 
where the K-Means method is the most frequently 
used, whereas the methods in the "Unspecified 
number of clusters" group are evenly distributed. 


To enhance the performance of a clustering 
method, specific techniques are required to produce 
optimal results. The most widely used method in the 
"Specified number of clusters" group is the Elbow 
method, accounting for 81%, while in the 
"Unspecified number of clusters" group the most 
frequently used method relies on radius values 
between data points. The Silhouette Index is the 
most commonly used method for evaluating 
clustering performance, observed in 29% of the 
journals analyzed. 